Skip to content
/ server Public

MDEV-38891 fixed THD read access violation #4694

Open
itzanway wants to merge 4857 commits intoMariaDB:10.6from
itzanway:main
Open

MDEV-38891 fixed THD read access violation #4694
itzanway wants to merge 4857 commits intoMariaDB:10.6from
itzanway:main

Conversation

@itzanway
Copy link

No description provided.

svoj and others added 30 commits January 12, 2026 23:35
Test output was affected by incompletely closed preceding connections.

Make test agnostic to concurrent connections by querying
information_schema.processlist only for connections that it uses.

Avoid ID-CONNECTION_ID() expression, it is not deterministic.
Test was affected by incompletely closed preceding connections.

Make test agnostic to concurrent connections by querying
performance_schema.threads only for connections that it uses.
Embedded mode produces different thread identifiers, which makes
replace_result replace unwanted occurences. Use temporary table
for filtering out unwanted threads such that replace_result is
not needed anymore.

This is regression after 0eebe3a.
…UND mismatch)

Test was affected by incompletely closed preceding connections.

Make test agnostic to concurrent connections by querying
performance_schema.threads only for connections that it uses.
Revert part of bead24b, which broke this test. The wait is still
needed to make variables of preceding connection disappear.
Test was affected by incompletely closed preceding connections.

Make test agnostic to concurrent connections by querying
performance_schema.user_variables_by_thread only for connections
that it uses.
Test was affected by incompletely closed preceding connections.

Make test agnostic to concurrent connections by querying
performance_schema.events_statements_current only for connections
that it uses.
Similar to MDEV-37483, use file name encoding for dbnames to create
directories. Adapt mariadb-import to convert the names back.
…ecution

Routine name resolution performed on a temporary memory root
during execution. So on the second execution LEX::spname members
pointed to a cleared memory.

Fixing to peform the resolution to parse time, like the CALL statement does.
During the execution time LEX::spname members now stay untouched.
Send ok packet earlier for SELECT queries that does not have any updates.
This is done in select_send::send_eof()

Axel's select benchmarks shows that this has a notable speed improvement:
1  Thread:  28% TPS speedup
8  Threads: 23%
64 Threads  6%

This was meassured with t_oltp_point_select With InnoDB running the client
over sockets.

Other things:
- Moved error reporting of LIMIT ROWS EXAMINED from handle_select() to
  check_limit_rows_examined(). This is to ensure that the error is
  reported before send_eof() is called.
- Removed duplicate "Query execution was interrupted" messages for the
  same query. killed_for_exceeding_limit_rows_warning_given was introduced
  for this purpose. We cannot use 'killed' flag in
  killed_for_exceeding_limit_rows() to detect if we already have produced
  a warning as the 'killed' in this case is reset for each union.

Reviewer: Sergei Golubchik <[email protected]>
correct the length check.
remove assertions that a file read from disk contains a specific substring
…e-level CREATE but not with global CREATE

CHECK TABLE was inconsistently requiring SELECT privilege on global/db level
or any privilege on the table/column level.

Change to require any table-applicable privilege on any level.
don't trust the content of a file read from disk
… queries to fail

don't copy field default values and check constraints in
CREATE ... SELECT.

CREATE ... SELECT means a table is created from a *result set*
not from some other table.

For backward compatibility, though, let's keep copying constant
default values and the "compressed" attribute.
…DEFAULT

VALUE() should only use table->insert_values when
table->insert_values contains row values.

table->insert_values gets row values for the ODKU clause
so if VALUE() is used before that it shouldn't use table->insert_values
…sions 10.11.X and beyond

my_getcputime() returns "cpu time in 1/10th on a microsecond (1e-7 s)"
otherwise it causes random failures in some later test that lists
files in $datadir/test
don't let the parser create ridiculously deep joins that
will be rejected later anyway
the "Test that bad value for plugin enum option is rejected correctly"
needed multiple fixes:

1. don't set plugin-dir based on $MYSQLTEST_VARDIR, all plugins are
  in var/plugins, but $MYSQLTEST_VARDIR is var/1/, var/2/, etc if
  --parallel is used (that is, practically always), thus the ha_example.so
  cannot be loaded, because cannot be found. Test fails with
  "unknown option --plugin-example-enum" as the plugin is not loaded

2. force --plugin-maturity=experimental, otherwise even if not parallel
   the plugin will fail to load because of low maturity, test still
   fails with "unknown option --plugin-example-enum"

3. don't specify .so extension explicitly otherwise the plugin still
   doesn't load on windows, even if paths and maturity are fixed

4. set --plugin-example=FORCE otherwise plugin fails to load after reading
   --plugin-example-enum-var=noexist because of unknown enum value,
   the server ignores the failure and starts normally. the test hangs.

5. This needs the fix in sql_plugin.cc to detect that the plugin is
   forced even when some options failed to parse. It used to consider
   plugin forced only if all options parsed correctly, which was wrong.

Now the test passes, testing what it was supposed to test - failure
to parse an enum value of a plugin option.

Without these fixes the test hanged as in 4 when run on the main branch
in non-prarallel (e.g. one test only) mode.
…keys on vault errors

* let use_cache_on_timeout apply to other errors
* enable use_cache_on_timeout by default and deprecate it
* increase cache_timeout to max and deprecate it
* change it from long to portable longlong
* delete both in 13.3
* put autocommit/commit outside of LOCK/UNLOCK.
* use uppercase like all other commands
* restore the old value of autocommit
with `ORDER BY number` if the number doesn't refer to a valid
result column, use this number in the error message not '???'.
restore OPTIMIZE/ANALYZE replication under @read_only
that was disabled in b62101f
apparently a file can be present in *more than one* rpm,
e.g. /usr/bin/dtrace on rhel10 is present *both* in
systemtap-sdt-dtrace and in systemtap-sdt-devel.

Make sure there's a separator between entries.
don't use such a greedy regex_replace pattern
event scheduler was printing a lot of info in [Note] in error log.

change to print its startup/shutdown notes only when log_warnings>0.
and runtime notes only when log_warnings>2.
one note was an abnormal error, change to [Error].
if ((res=item->val_str(str)) != str)

is incorrect way to detect whether res can be safely used,
because Item_char_typecast::val_str() can return res
which is different from str, but shares the same buffer.
itzanway and others added 6 commits February 24, 2026 22:28
Master_Server_Id was not cleared after CHANGE MASTER or RESET SLAVE,
showing a stale value until the slave reconnected. Reset master_id
and prev_master_id to 0 in both code paths.

The reset value (0) will be present in SHOW SLAVE STATUS until it is
re-evaluated to the id of a new connected master server.

Signed-off-by: Varun Deep Saini <[email protected]>
@grooverdan grooverdan changed the title fixed THD read access violation MDEV-38891 fixed THD read access violation Feb 25, 2026
@grooverdan grooverdan requested a review from vaintroub February 25, 2026 01:21
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 25, 2026
Copy link
Member

@vaintroub vaintroub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, here some points.

  • there is a lot of unrelated whitespace changes. Please do not include them into the patch.

  • if THD can't be created, at startup, write a warning to error log (@dr-m can tell how to do it best for Innodb, but there is sql_print_warning), but do not create the timer, so that dict_stats_func is not going to be periodically called, if it can't do anything. You do not need to check if thd is NULL, you can DBUG_ASSERT() that it is not NULL.

  • I think , and ask @dr-m to confirm, that whatever Innodb is doing with THDVAR(thd, background_thread) = true; is unnecessary. While harmless at startup, and can bite at shutdown, in innodb_reset_background_thd() in ut_ad(THDVAR(thd, background_thread));. To check if it is not a foreground thread, thd->thread_id == 0 check is entirely sufficient, yes , server ensures that, yes, even then thread_ids wrap around, when 4 bytes ints are exhausted. I would remove that thing in fact, too. together with thd_proc_info(thd, name);, I have no idea what purpose does it serve, the background THDs are invisible from outside. I'd say, these wrappers around create_background_thd() and reset_thds() are obsolete, but would prefer to have @dr-m to confirm it.

  • While bug was reported in the main branch, bug it mostly likely exists in all previous versions. Which means, fix could need to be rebased to 10.6 or 10.11, or whatever @gkodinov will decide. As far as I can see it, this bug might not be of a huge importance. From my limited my understanding and very short investigation, is that it only happens when server was not going to start anyway, otherwise, in case of successful startup, dict_stats cleanup would happen earlier, at ha_preshutdown/innodb_preshutdown stage, but I'd again let @dr-m confirm that. UPDATE: ok, this can happen at any bootstrap, unlike normal startup, it does not run ha_preshutdown currently.

@vaintroub vaintroub requested a review from dr-m February 25, 2026 10:26
Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that MDEV-38891 affects all currently maintained MariaDB Server branches. Therefore, a fix of this should target the 10.6 branch.

The real challenge is to reproduce the issue, possibly by injecting some sleep within DBUG_EXECUTE_IF. I would like to see a test case (even a nondeterministic one that fails once in N attempts) and some analysis of what is going on and how the fix is supposed to prevent the problem.

destroy_background_thd(dict_stats_thd);
dict_stats_thd = nullptr;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention, our source code files end in a line terminator.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My editor stripped it out by accident. I've added the missing newline terminator in the latest commit

Copy link
Author

@itzanway itzanway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I've updated the PR to address these points:

Cleaned up the unrelated whitespace changes.

Added a check in dict_stats_start(): if THD creation fails, it now logs a warning via sql_print_warning and skips creating the timer.

Replaced the if (!dict_stats_thd) check in dict_stats_func() with DBUG_ASSERT(dict_stats_thd != nullptr).

@itzanway
Copy link
Author

@vaintroub
I have added a test case using DBUG_EXECUTE_IF to reliably trigger the shutdown race condition, and verified it passes cleanly with this fix." Let me know if Buildbot gives you any trouble!

Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! This is a preliminary review.

There's already a very active review discussion: please keep working on that.

These are the extra points I see:

  • Since this is a crashing bug, I'd agree with Marko that this should be rebased to 10.6. Please rebase.
  • Please use a single commit with a commit message conforming to CODING_STANDARDS.md
  • Please make sure that buildbot runs pass. Right now some platforms don't even compile

@dr-m
Copy link
Contributor

dr-m commented Feb 26, 2026

@vaintroub, I think that you are right about THDVAR(thd, background_thread). First and foremost, I’d like to see a test case for reproducing the problem, ideally in mysql-test, such that without the fix, the test would at least occasionally fail. There are a few special cases around server startup and shutdown, depending on the value of innodb_fast_shutdown and whether InnoDB was started successfully or in some special mode, such as innodb_force_recovery or innodb_read_only or bootstrap.

@vaintroub
Copy link
Member

vaintroub commented Feb 26, 2026

I have seen it once after running test suite hundreds of times, and luckily got it a the debugger. it is innodb_bootstrap test, and I can't say anything more about it. I reported the error as community service, the rest, as for reproducibility, is up to Innodb team. The thing that is failing is bootstrap "mariadbd --bootstrap"

@itzanway itzanway requested a review from gkodinov February 27, 2026 10:31
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason you've re-requested review from me? I have nothing new to add and nothing in the PR has changed. So please work on the remarks in my previous review.

tonychen2001 and others added 2 commits March 3, 2026 14:10
Add support for reversed executable comments using /*!!version */ and
/*M!!version */ syntax. These execute the comment body only when the
server version is strictly less than the specified version, which is
the inverse of the existing /*!version */ syntax.

This enables writing portable SQL that uses newer syntax on new servers
while falling back to older syntax on older servers, e.g.:

  CREATE /*!100000 OR REPLACE */ TABLE /*!!100000 IF NOT EXISTS */ t1 ...

On MariaDB >= 10.0 this expands to CREATE OR REPLACE TABLE t1, while
on older versions it expands to CREATE TABLE IF NOT EXISTS t1.

Implementation: in lex_one_token(), after detecting a versioned comment
(/*! or /*M!), check for an additional '!' character. If present, invert
the version comparison so the comment body is expanded only when
MYSQL_VERSION_ID < version.

Tests added to main.comments and plugins.server_audit confirming:
- Reversed comments with version <= server version do not execute
- Reversed comments with version > server version execute
- MariaDB-specific /*M!! variant works correctly
- Reversed comments without a version number always execute
- Audit plugin correctly logs executed reversed comments
- Combined forward + reversed comments in a single statement

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
There is a lot of more work needed to make this code compatitble with
the rest of the MariaDB server.

- All std:: usage should be removed
- Most templatest should be removed
- Usage of <unordered_map>, <string_view>, <optional>, <unordered_set>
  should be removed and replaced with my_sys, sql_string functions

It would be nice of one could also have default values for all the
CHANGE MASTER variables in my.cnf.  This would allow one to have
much shorter CHANGE MASTER commands
@itzanway
Copy link
Author

itzanway commented Mar 5, 2026

@dr-m @vaintroub I have resolved the merge conflicts and ported this fix to the 10.6 branch.

I also addressed the C++ nitpicks:

Removed the redundant if check around delete dict_stats_timer.

Restored the missing newline terminator at the end of dict0stats_bg.cc.

Finally, I added a deterministic test case (mdev_38891.test) using DBUG_EXECUTE_IF. By injecting a 2-second sleep into dict_stats_func, we can simulate the exact race condition where the thread wakes up during early shutdown while the plugin system is being destroyed. The test reliably crashes an unpatched server and passes cleanly with this fix applied."

itzanway and others added 3 commits March 5, 2026 16:17
@itzanway itzanway changed the base branch from main to 10.6 March 11, 2026 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.